Forest-Based Semantic Role Labeling

Authors

  • Hao Xiong
  • Haitao Mi
  • Yang Liu
  • Qun Liu
Abstract

Parsing plays an important role in semantic role labeling (SRL) because most SRL systems infer semantic relations from 1-best parses. Therefore, parsing errors inevitably lead to labeling mistakes. To alleviate this problem, we propose to use a packed forest, which compactly encodes all parses for a sentence. We design an algorithm to exploit exponentially many parses to learn semantic relations efficiently. Experimental results on the CoNLL-2005 shared task show that using forests achieves an absolute improvement of 1.2% in terms of F1 score over using 1-best parses and 0.6% over using 50-best parses.

Introduction

Semantic role labeling (SRL) is considered an important task in natural language processing and has recently been used in a variety of natural language applications, such as Information Extraction (Surdeanu et al. 2003), Question Answering (Shen and Lapata 2007), Machine Translation (Wu and Fung 2009), and Coreference Resolution (Kong et al. 2008). Given a sentence, the goal of SRL is to assign semantic roles (arguments) to syntactic constituents for each target verb (predicate). Arguments usually include Agent, Patient, Instrument, etc., as well as adjuncts such as Locative, Temporal, Manner, Cause, etc. For an overview of semantic role labeling, readers can refer to Màrquez (2009).

Generally, semantic role labeling consists of two steps: identifying and classifying arguments. The former step labels each syntactic element as either a semantic argument or a non-argument, while the latter assigns a specific semantic role to each identified argument. To distinguish the different semantic roles, most previous work maps one argument to one syntactic constituent and then extracts effective features for that constituent. As Punyakanok, Roth, and Yih (2005) show, most systems rely heavily on full syntactic parse trees.
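To make the mapping of one argument to one syntactic constituent concrete, the following minimal Python sketch collects the spans of all constituents in a toy parse tree and looks up an argument span among them. The nested-tuple tree format, the function names `constituent_spans` and `map_argument`, and the example sentence are assumptions made for illustration; they are not taken from the paper.

```python
# Toy constituency tree as nested tuples: (label, child, child, ...).
# A leaf is a plain string token. This representation is an assumption
# made for illustration, not the data format used in the paper.

def constituent_spans(tree, start=0, spans=None):
    """Return (end, spans), where spans maps (start, end) -> list of labels."""
    if spans is None:
        spans = {}
    if isinstance(tree, str):            # a single token covers one position
        return start + 1, spans
    label, *children = tree
    end = start
    for child in children:
        end, _ = constituent_spans(child, end, spans)
    spans.setdefault((start, end), []).append(label)
    return end, spans


def map_argument(tree, arg_span):
    """Return the labels of constituents that exactly cover arg_span, or []."""
    _, spans = constituent_spans(tree)
    return spans.get(arg_span, [])


tree = ("S",
        ("NP", ("DT", "The"), ("NN", "parser")),
        ("VP", ("VBD", "made"),
               ("NP", ("DT", "an"), ("NN", "error"))))

print(map_argument(tree, (0, 2)))   # span of "The parser"
print(map_argument(tree, (1, 3)))   # span of "parser made"
```

The first lookup finds an NP constituent, while the second returns an empty list: no single constituent covers that span, which is the situation a tree-based system faces when a parsing error splits an argument.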
Because of error propagation and amplification through the chained modules, the overall performance of the system is largely determined by the quality of the automatic syntactic parsers. However, to the best of our knowledge, previously reported work has employed only 1-best parses or lists of k-best parses. With such limited derivations and variations, syntactic parsing errors will inevitably affect the performance of SRL. For example, most traditional systems first map one argument to one syntactic constituent in the parse tree; however, according to our statistics, more than 15% of arguments map to many syntactic constituents in the full data of the CoNLL-2005 shared task (Carreras and Màrquez 2005) because of parsing errors. Table 1 shows the results for two state-of-the-art parsers, Charniak (Charniak 2001) and Collins (Collins 1999); the full data still include more than forty hundred arguments with one-to-many mappings in the corresponding syntactic parse trees.

Argument    Charniak Parser    Collins Parser
AM-MOD      10,099             10,112
A1           9,162             11,327
AM-NEG       3,556              3,560
A0           2,760              3,812
AM-DIS       1,629              1,651

Table 1: The top five arguments that map to many syntactic constituents obtained with the Charniak and Collins parsers. (A0 and A1 are two normalized arguments, usually viewed as the subject and object of a sentence. AM-MOD, AM-NEG, and AM-DIS are three adjuncts indicating modal verbs, negation particles, and clauses, respectively.)

Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

To alleviate the effects of parsing errors and represent more derivations of the parse tree, we employ a packed forest, which includes almost all derivations of the parse trees. Our use of a packed forest is mainly inspired by Mi, Huang, and Liu (2008), who also use a packed forest to weaken the impact of parsing errors in a machine translation system.
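As a rough illustration of how a packed forest shares structure among alternative parses, the sketch below stores a forest as a hypergraph and counts its derivations with memoized dynamic programming. The dictionary layout, the node naming scheme (label, start, end), the function name `count_parses`, and the toy prepositional-phrase ambiguity are all assumptions for this example; the paper's actual forest representation and algorithm are not shown in this excerpt.

```python
from functools import lru_cache
from math import prod

# Hypothetical packed forest: node -> list of hyperedges, where each hyperedge
# is a tuple of child nodes. Nodes absent from the dict are leaves.
# Toy ambiguity over four tokens: the PP (2, 4) may attach to the inner NP
# or to the VP, and it is stored only once.
forest = {
    ("VP", 0, 4): [(("V", 0, 1), ("NP", 1, 4)),
                   (("VP", 0, 2), ("PP", 2, 4))],
    ("VP", 0, 2): [(("V", 0, 1), ("NP", 1, 2))],
    ("NP", 1, 4): [(("NP", 1, 2), ("PP", 2, 4))],
    ("PP", 2, 4): [(("P", 2, 3), ("NP", 3, 4))],
}


@lru_cache(maxsize=None)
def count_parses(node):
    """Count the distinct trees rooted at node without enumerating them."""
    edges = forest.get(node, [])
    if not edges:                        # leaf node: exactly one (trivial) tree
        return 1
    # Sum over alternative hyperedges, multiply over the children of each edge.
    return sum(prod(count_parses(child) for child in edge) for edge in edges)


print(count_parses(("VP", 0, 4)))   # two attachments, so two parses
```

Each node is visited once, so the cost grows with the size of the forest rather than with the number of trees it encodes, which is what makes working over exponentially many parses feasible.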
Nevertheless, to our knowledge, this is the first use of a packed forest for semantic role labeling. We first extract useful features over the forest, and then use a max-entropy classifier to identify and classify semantic roles in one step. (Although support vector machines are more effective for SRL, a max-entropy classifier handles multi-class classification problems more easily and runs faster.) Experimental results on the CoNLL-2005 shared task (http://www.cnts.ua.ac.be/conll2005/) show that our approach significantly improves the performance of the SRL system and achieves an absolute improvement of 1.2% in F1 score over the 1-best system.

We first briefly describe previous work on semantic role labeling and review the traditional tree-based approach. We then describe our forest-based model. Finally, we present experimental results for the different methods and conclude.

Semantic Role Labeling

Semantic role labeling plays an important role in natural language processing. Given a sentence, the goal of SRL is to identify the arguments of each target verb and then classify each identified argument into a semantic role. For example, given the sentence “The economy ’s temperature will be taken from several vantage points this week”, the goal of SRL is to identify the arguments of the verb take, which yields the following output:

[A1 The economy ’s temperature] [AM-MOD will] be [V taken] [A2 from several vantage points] [AM-TMP this week].

Here A1 represents the thing taken, A2 represents the entity taken from, AM-MOD is an adjunct indicating the modal verb, AM-TMP is an adjunct indicating the timing of the action, and V marks the verb. Generally, arguments such as A1 and A2 have different semantics for each target verb, as specified in the PropBank (Kingsbury and Palmer 2002) frame files. Moreover, each argument can find a constituent in the corresponding full syntactic parse tree.
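The labeled output in the example above can be viewed as a set of non-overlapping token spans, each carrying a role label. The short Python sketch below renders such spans in the bracketed notation; the function name `render`, the (label, start, end) tuple layout with exclusive end indices, and the whitespace tokenization are assumptions chosen for illustration.

```python
def render(tokens, roles):
    """Render non-overlapping (label, start, end) spans in bracket notation.

    Indices are token positions; end is exclusive.
    """
    starts = {start: (label, end) for label, start, end in roles}
    out, i = [], 0
    while i < len(tokens):
        if i in starts:
            label, end = starts[i]
            out.append("[" + label + " " + " ".join(tokens[i:end]) + "]")
            i = end                      # jump past the labeled span
        else:
            out.append(tokens[i])        # unlabeled token, e.g. "be"
            i += 1
    return " ".join(out)


tokens = ("The economy 's temperature will be taken "
          "from several vantage points this week").split()

# Role spans for the predicate "taken", following the example in the text.
roles = [
    ("A1", 0, 4),
    ("AM-MOD", 4, 5),
    ("V", 6, 7),
    ("A2", 7, 11),
    ("AM-TMP", 11, 13),
]

print(render(tokens, roles))
```

Keeping roles as spans rather than as bracketed text makes it straightforward to compare an argument against the constituent spans of a parse tree.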
For more details on PropBank, readers can refer to Kingsbury and Palmer (2002) and Palmer, Gildea, and Kingsbury (2005). The work of Gildea and Jurafsky (2002), who used basic features such as Phrase Type, Governing Category, and Parse Tree Path and employed an interpolation method to identify and classify the syntactic constituents in FrameNet (Baker, Fillmore, and Lowe 1998), can be viewed as the first work on automatic semantic role labeling. Following this work, later studies focused on exploiting additional features (Pradhan et al. 2003; Chen and Rambow 2003; Xue and Palmer 2004; Jiang, Li, and Ng 2005), employing effective machine learning models (Nielsen and Pradhan 2004; Punyakanok et al. 2004; Pradhan et al. 2004; Moschitti 2004; Pradhan et al. 2005a; Zhang et al. 2007), using different syntactic views (Gildea and Hockenmaier 2003; Pradhan et al. 2005b), robust labeling (Pradhan, Ward, and Martin 2008), and finding similar verbs (Andrew and Reid 2007; Pennacchiotti et al. 2008). In addition, some work has focused on semi-supervised (Thompson 2004; Deschacht and Moens 2009; Fürstenau and Lapata 2009a; Fürstenau and Lapata 2009b) or unsupervised semantic role labeling (Swier and Stevenson 2004; Rappoport 2009). Moreover, semantic role labeling became a well-defined shared task at the CoNLL 2004, 2005, and 2008 conferences.

Most of this previous work can be viewed as tree-based SRL, since it takes as input only 1-best or k-best parse trees, which inevitably affect the performance of SRL due ...


Similar resources

VHR Semantic Labeling by Random Forest Classification and Fusion of Spectral and Spatial Features on Google Earth Engine

Semantic labeling is an active field in remote sensing applications. Although handling highly detailed objects in Very High Resolution (VHR) optical images and VHR Digital Surface Models (DSM) is a challenging task, it can improve the accuracy of semantic labeling methods. In this paper, a semantic labeling method is proposed by fusion of optical and normalized DSM data. Spectral and spatial featur...


Automatic Semantic Role Labeling of Persian Sentences Using Dependency Trees

Automatically identifying words with semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching the correct semantic roles to them may lead to improvement in many natural language processing tasks, including information extraction, question answering, text summarization, and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...


Semantic Role Labeling of Persian Sentences with a Memory-Based Learning Approach

Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using a memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...


Convolution Kernel over Packed Parse Forest

This paper proposes a convolution forest kernel to effectively explore rich structured features embedded in a packed parse forest. As opposed to the convolution tree kernel, the proposed forest kernel does not have to commit to a single best parse tree, and is thus able to explore very large object spaces and many more structured features embedded in a forest. This makes the proposed kernel more ro...


XARA: An XML- and Rule-based Semantic Role Labeler

XARA is a rule-based PropBank labeler for Alpino XML files, written in Java. I used XARA in my research on semantic role labeling in a Dutch corpus to bootstrap a dependency treebank with semantic roles. Rules in XARA are based on XPath expressions, which makes it a versatile tool that is applicable to other treebanks as well. In addition to automatic role annotation, XARA is able to extract tr...




Publication date: 2010